Search CORE

19 research outputs found

ChatGPT Chemistry Assistant for Text Mining and Prediction of MOF Synthesis

Author: Borgs Christian
Chayes Jennifer T.
Yaghi Omar M.
Zhang Oufan
Zheng Zhiling
Publication date: 19/07/2023

We use prompt engineering to guide ChatGPT in the automation of text mining of metal-organic framework (MOF) synthesis conditions from diverse formats and styles of the scientific literature. This effectively mitigates ChatGPT's tendency to hallucinate information, an issue that previously made the use of Large Language Models (LLMs) in scientific fields challenging. Our approach involves the development of a workflow implementing three different processes for text mining, programmed by ChatGPT itself. All of them enable parsing, searching, filtering, classification, summarization, and data unification with different tradeoffs between labor, speed, and accuracy. We deploy this system to extract 26,257 distinct synthesis parameters pertaining to approximately 800 MOFs sourced from peer-reviewed research articles. This process incorporates our ChemPrompt Engineering strategy to instruct ChatGPT in text mining, resulting in impressive precision, recall, and F1 scores of 90-99%. Furthermore, with the dataset built by text mining, we constructed a machine-learning model with over 86% accuracy in predicting MOF experimental crystallization outcomes and preliminarily identifying important factors in MOF crystallization. We also developed a reliable data-grounded MOF chatbot to answer questions on chemical reactions and synthesis procedures. Given that ChatGPT reliably mines and tabulates diverse MOF synthesis information in a unified format, using only narrative language that requires no coding expertise, we anticipate that our ChatGPT Chemistry Assistant will be very useful across various other chemistry sub-disciplines.
Comment: Published in the Journal of the American Chemical Society (2023); 102 pages (18-page manuscript, 84 pages of supporting information)
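The precision, recall, and F1 scores quoted in this abstract follow the standard definitions for extraction tasks. As a minimal sketch, assuming each extracted synthesis parameter is scored as a true positive, false positive, or false negative against a manual annotation, the three metrics can be computed as below. The counts in the example are hypothetical and are not taken from the paper.

```python
def prf1(tp: int, fp: int, fn: int) -> tuple[float, float, float]:
    """Return (precision, recall, F1) from true-positive, false-positive
    and false-negative counts; empty denominators yield 0.0."""
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    # F1 is the harmonic mean of precision and recall
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return precision, recall, f1


# Hypothetical counts for one kind of extracted synthesis parameter
p, r, f = prf1(tp=95, fp=3, fn=5)
print(f"precision={p:.3f} recall={r:.3f} F1={f:.3f}")
```

Because F1 is a harmonic mean, it stays close to the lower of the two scores, so a high F1 requires both few hallucinated entries (precision) and few missed entries (recall).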

arXiv.org e-Print Archive

Learning to Evolve Structural Ensembles of Unfolded and Disordered Proteins Using Experimental Solution Data

Author: Forman-Kay Julie D
Haghighatlari Mojtaba
Head-Gordon Teresa
Li Jie
Liu Zi-Hao
Namini Ashley
Teixeira Joao Miguel Correia
Zhang Oufan
Publication date: 24/07/2022

We have developed a Generative Recurrent Neural Network (GRNN) that learns the probability of the next residue torsions $X_{i+1} = [\phi_{i+1}, \psi_{i+1}, \omega_{i+1}, \chi_{i+1}]$ from the previous residue in the sequence, $X_i$, to generate new IDP conformations. In addition, we couple the GRNN with a Bayesian model, X-EISD, in a reinforcement learning step that biases the probability distributions of torsions to take advantage of experimental data types such as J-couplings, NOEs, and PREs. We show that updating the generative model parameters according to the reward feedback, based on the agreement between structures and data, improves upon existing approaches that simply reweight static structural pools for disordered proteins. Instead, the GRNN "DynamICE" model learns to physically change the conformations of the underlying pool to those that better agree with experiment.
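Read as a generative model, the GRNN described above samples a chain one residue at a time. A sketch of the corresponding autoregressive factorization follows; the chain length $N$, hidden state $h_i$, update function $f_\theta$, and parameters $\theta$ are our own notation, not the paper's, and the recurrent update is the generic RNN form rather than DynamICE's specific architecture:

```latex
p(X_1, \dots, X_N) = p(X_1) \prod_{i=1}^{N-1} p\left(X_{i+1} \mid X_1, \dots, X_i\right),
\qquad
p\left(X_{i+1} \mid X_1, \dots, X_i\right) \approx p_\theta\left(X_{i+1} \mid h_i\right),
\qquad
h_i = f_\theta\left(h_{i-1}, X_i\right)
```

Each $X_i = [\phi_i, \psi_i, \omega_i, \chi_i]$ is the torsion vector of residue $i$. Under this reading, the reinforcement learning step adjusts $\theta$, and therefore the sampled torsion distributions themselves, rather than reweighting a fixed pool of structures.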

arXiv.org e-Print Archive

Women Stereotypes in Shi Zhecun's Short Stories

Author: Chen Yu-Shih
Christopher Rosenmeier
Croll
Dikötter
Dooling Amy D.
Genette Gérard
Gálik Marián
Hershatter Gail
Lee Leo Ou-Fan [Li Oufan]
Link Perry
Shih Shu-Mei
Tang Wenyi
WU Yonggang
Yeh Wen-Hsin
Zhang Henshui
Zhang Yingjin
Publication venue: 'SAGE Publications'
Publication date: 01/01/2011

Crossref

Edinburgh Research Explorer

Learning Correlations between Internal Coordinates to Improve 3D Cartesian Coordinates for Proteins.

Author: Zhang Oufan
Publication date: 04/10/2023

Ezid


Learning Correlations between Internal Coordinates to Improve 3D Cartesian Coordinates for Proteins.

Author: Forman-Kay Julie
Head-Gordon Teresa
Lee Seokyoung
LI JIE
Liu Zi
Namini Ashley
Teixeira João
Zhang Oufan
Publication venue: eScholarship, University of California
Publication date: 25/07/2023

We consider a generic representation problem of internal coordinates (bond lengths, valence angles, and dihedral angles) and their transformation to 3-dimensional Cartesian coordinates of a biomolecule. We show that the internal-to-Cartesian process relies on correctly predicting chemically subtle correlations among the internal coordinates themselves, and that learning these correlations increases the fidelity of the Cartesian representation. We developed a machine learning algorithm, Int2Cart, to predict bond lengths and bond angles from the backbone torsion angles and residue types of a protein, which allows protein structures to be reconstructed more accurately than with fixed bond lengths and bond angles or with a static library method that relies on backbone torsion angles and residue types in a local environment. The method can also be used for structure validation: we show that the agreement between Int2Cart-predicted bond geometries and those from an AlphaFold 2 model can be used to estimate model quality. Additionally, by using Int2Cart to reconstruct an IDP ensemble, we are able to decrease the clash rate during modeling. The Int2Cart algorithm has been implemented as a publicly accessible Python package at https://github.com/THGLab/int2cart

eScholarship - University of California


Protein Dynamics to Define and Refine Disordered Protein Ensembles

Author: Forman-Kay Julie D
Gradinaru Claudiu C
Haghighatlari Mojtaba
Head-Gordon Teresa
Li Jie
Namini Ashley
Naullage Pavithra M
Teixeira João MC
Zhang Oufan
Publication venue: eScholarship, University of California
Publication date: 10/03/2022

Intrinsically disordered proteins and unfolded proteins have fluctuating conformational ensembles that are fundamental to their biological function and impact protein folding, stability, and misfolding. Despite the importance of protein dynamics and conformational sampling, time-dependent data types are not fully exploited when defining and refining disordered protein ensembles. Here we introduce a computational framework using an elastic network model and normal-mode displacements to generate a dynamic disordered ensemble consistent with NMR-derived dynamics parameters, including transverse R2 relaxation rates and Lipari-Szabo order parameters (S2 values). We illustrate our approach using the unfolded state of the drkN SH3 domain to show that the dynamical ensembles give better agreement than a static ensemble for a wide range of experimental validation data, including NMR chemical shifts, J-couplings, nuclear Overhauser effects, paramagnetic relaxation enhancements, residual dipolar couplings, hydrodynamic radii, single-molecule fluorescence Förster resonance energy transfer, and small-angle X-ray scattering.

eScholarship - University of California

A benchmark dataset for Hydrogen Combustion.

Author: Bertels Luke
Das Akshaya
Guan Xingyi
Haghighatlari Mojtaba
Hao Hongxia
Head-Gordon Martin
Head-Gordon Teresa
Heidar-Zadeh Farnaz
Leven Itai
Li Jie
Liu Meili
Stein Christopher J
Zhang Oufan
Publication venue: eScholarship, University of California
Publication date: 01/05/2022

The generation of reference data for deep learning models is challenging for reactive systems, and more so for combustion reactions due to the extreme conditions that create radical species and alternative spin states during the combustion process. Here, we extend intrinsic reaction coordinate (IRC) calculations with ab initio MD simulations and normal mode displacement calculations to more extensively cover the potential energy surface for 19 reaction channels for hydrogen combustion. A total of ∼290,000 potential energies and ∼1,270,000 nuclear force vectors are evaluated with a high-quality range-separated hybrid density functional, ωB97X-V, to construct the reference data set, including transition state ensembles, for deep learning models to study hydrogen combustion reactions.

PubMed Central

eScholarship - University of California

IDPConformerGenerator: A Flexible Software Suite for Sampling the Conformational Space of Disordered Protein States.

Author: Forman-Kay Julie D
Haghighatlari Mojtaba
Head-Gordon Teresa
Krzeminski Mickaël
Li Jie
Liu Zi Hao
Namini Ashley
Shamandy Alaa A
Teixeira João MC
Vernon Robert M
Yu Lei
Zhang Oufan
Publication venue: eScholarship, University of California
Publication date: 01/09/2022

The power of structural information for informing biological mechanisms is clear for stable folded macromolecules, but similar structure-function insight is more difficult to obtain for highly dynamic systems such as intrinsically disordered proteins (IDPs), which must be described as structural ensembles. Here, we present IDPConformerGenerator, a flexible, modular open-source software platform for generating large and diverse ensembles of disordered protein states that builds conformers that obey geometric, steric, and other physical restraints on the input sequence. IDPConformerGenerator samples backbone phi (φ), psi (ψ), and omega (ω) torsion angles of relevant sequence fragments from loops and secondary structure elements extracted from folded protein structures in the RCSB Protein Data Bank and builds side chains from robust Monte Carlo algorithms using expanded rotamer libraries. IDPConformerGenerator has many user-defined options enabling variable fractional sampling of secondary structures, supports Bayesian models for assessing the consistency of IDP ensembles with experimental data, and introduces a machine learning approach to transform between internal and Cartesian coordinates with reduced error. IDPConformerGenerator will facilitate the characterization of disordered proteins to ultimately provide structural insights into these states that have key biological functions.
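One of the steric restraints a conformer builder has to enforce is the absence of atomic clashes. The sketch below is a simplified, hypothetical check and does not reproduce IDPConformerGenerator's actual restraint code. It treats a conformer as a list of atom coordinates in Å, flags pairs closer than an assumed `min_dist`, and skips pairs whose indices differ by at most an assumed `skip` (stand-ins for chemically bonded near neighbors). The `clash_rate` helper reports the fraction of conformers in an ensemble with at least one clash, the kind of quantity the Int2Cart abstract says it reduces. Both thresholds are illustrative assumptions.

```python
import itertools
import math

Coord = tuple[float, float, float]


def count_clashes(coords: list[Coord], min_dist: float = 3.0, skip: int = 3) -> int:
    """Count atom pairs closer than min_dist (Å), ignoring pairs whose
    indices differ by <= skip (hypothetical near-neighbor exclusion)."""
    clashes = 0
    for i, j in itertools.combinations(range(len(coords)), 2):
        if j - i <= skip:
            continue  # near neighbors along the chain may legitimately be close
        if math.dist(coords[i], coords[j]) < min_dist:
            clashes += 1
    return clashes


def clash_rate(ensemble: list[list[Coord]], **kwargs) -> float:
    """Fraction of conformers that contain at least one clash."""
    if not ensemble:
        return 0.0
    clashing = sum(1 for conf in ensemble if count_clashes(conf, **kwargs) > 0)
    return clashing / len(ensemble)


# Toy C-alpha-like trace with 3.8 Å spacing, plus a copy with one overlapping atom
chain = [(3.8 * i, 0.0, 0.0) for i in range(5)]
print(count_clashes(chain), clash_rate([chain, chain + [(0.5, 0.0, 0.0)]]))
```

The all-pairs loop is O(n²). That is fine for a sketch, but a production builder would more likely use a spatial grid or neighbor list so that each newly placed atom is checked only against nearby atoms.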

PubMed Central

eScholarship - University of California

Postrevolutionary Leftovers

Author: [ChuangTzu]
[Confucius]
[Mencius]
[Mencius]
[Motse]
[Motse]
Ahmad
Ahmad
Ailing
Allan
Anderson
Anderson
Ang
Ang
Anyi
Anyi
Ayres
Bakhtin
Bakhtin
Barthes
Barthes
Beijing shifan daxue zhongwenxi (Beijing Normal University Department of Chinese)
Benjamin
Benjamin
Benjamin
Benjamin
Bhabha
Blinde
Brenkman
Brown
Brown
Brown
Buwei
Cao
Capra
Capra
Chan
Chang
Chang
Chen Pingyuan
Cheng
Cheng
Cheng
Cheng
Cheng’en
Cheng’en
Cheng’en
Chin
Chin
Chow
Chow
Chu
Congwen
Conrad
de Man
Delezelova-Velingerova
Denton
Derrida
Dewei
Dewei
Ding
Ding
Ding
Ding
Ding
Dirlik
Douglas
Duke
Eagleton
Eco
Ellmann
Farquhar
Fei
Fengzhu
Frank
Freud
Freud
Gaizun
Gare
Gates
Gates
Gates
Goldblatt
Goldblatt
Goodman
Grey
Guanzhong
Guanzhong
Harris
Haslam
Heqing
Hong
Hong
Hong
Hong
Hongzhen
Hongzhen
Hsia
Hua
Hua
Hua
Hualing
Hualing
Huang Ziping
Hulme
Huters
Huters
Huters
Jameson
Jameson
Jameson
Jameson
Jameson
Jameson
Jameson
Jeanneret
Ji
Ji
Jingzhi
Jingzhi
Jinhua
Jizhi
Kane
Ke
Kilgour
Kim
Kim
Kingston
Kingston
Kinkley
Knechtges
Kuide
Lan
Larson
Larson
Lau
Lee
Lee
Lin
Ling
Ling
Liqun
Liu
Louie
Lowe
Lyell
Lévi-Strauss
MacQueen
Maike
Mair
Meng
Meng
Meng
Metz
Metzger
Miaozi
Miller
Mote
Nai’an
Neng
Nieh
Oufan
Owen
Pingyuan
Plaks
Plaks
Prusek
Qian
Qian
Qing
Qing
Qiuming
Qiuming
Qiuming
Rees
Rickett
Rong
Rong
Ruowang
Ruowang
Ruowang
Ruowang
Ryan
Said
Sanborn
Sanday
Saussy
Schafer
Sessions
Shengxiu
Shige juan
Shih
Shuangtian
Shuqing
Sihe
Simmons
Smith
Spence
Spencer
Su
Takaki
Tan
Tan
Tan
Tang
Tannahill
Tzu]
Tzu]
Tzu]
Wang
Wang
Wang
Wang
Wang
Wenfu
Wenfu
Wenfu
Wenyibao
West
White
White
Wong
Wong
Wong
Wong
Xianliang
Xianliang
Xianliang
Xiaobo
Xiaoming
Xiaoming
Xiguang
Xun
Xun
Yan
Yan
Yan
Yan
Yan
Yan
Yan
Yan
Yan
Yan
Yan
Yanchi
Yang
Yang
Yanxiang
Yi
Yi
Yi
Yizi
Yu
Yu
Yu
Yuanlun
Yushi
Yutang
Yü
Zedong
Zedong
Zedong
Zehou
Zehou
Zehou
Zha
Zhang
Zhang
Zhaoyang
Zhenyun
Ziping
Zongfa
Zuoren
Zuoren
Publication venue: 'Duke University Press'
Publication date: 01/01/1999

Crossref

Notes

Author: [ChuangTzu]
[Confucius]
[Mencius]
[Mencius]
[Motse]
[Motse]
Ahmad
Ahmad
Ailing
Allan
Anderson
Anderson
Ang
Ang
Anyi
Anyi
Ayres
Bakhtin
Bakhtin
Barthes
Barthes
Beijing shifan daxue zhongwenxi (Beijing Normal University Department of Chinese)
Benjamin
Benjamin
Benjamin
Benjamin
Bhabha
Blinde
Brenkman
Brown
Brown
Brown
Buwei
Cao
Capra
Capra
Chan
Chang
Chang
Chen Pingyuan
Cheng
Cheng
Cheng
Cheng
Cheng’en
Cheng’en
Cheng’en
Chin
Chin
Chow
Chow
Chu
Congwen
Conrad
de Man
Delezelova-Velingerova
Denton
Derrida
Dewei
Dewei
Ding
Ding
Ding
Ding
Ding
Dirlik
Douglas
Duke
Eagleton
Eco
Ellmann
Farquhar
Fei
Fengzhu
Frank
Freud
Freud
Gaizun
Gare
Gates
Gates
Gates
Goldblatt
Goldblatt
Goodman
Grey
Guanzhong
Guanzhong
Harris
Haslam
Heqing
Hong
Hong
Hong
Hong
Hongzhen
Hongzhen
Hsia
Hua
Hua
Hua
Hualing
Hualing
Huang Ziping
Hulme
Huters
Huters
Huters
Jameson
Jameson
Jameson
Jameson
Jameson
Jameson
Jameson
Jeanneret
Ji
Ji
Jingzhi
Jingzhi
Jinhua
Jizhi
Kane
Ke
Kilgour
Kim
Kim
Kingston
Kingston
Kinkley
Knechtges
Kuide
Lan
Larson
Larson
Lau
Lee
Lee
Lin
Ling
Ling
Liqun
Liu
Louie
Lowe
Lyell
Lévi-Strauss
MacQueen
Maike
Mair
Meng
Meng
Meng
Metz
Metzger
Miaozi
Miller
Mote
Nai’an
Neng
Nieh
Oufan
Owen
Pingyuan
Plaks
Plaks
Prusek
Qian
Qian
Qing
Qing
Qiuming
Qiuming
Qiuming
Rees
Rickett
Rong
Rong
Ruowang
Ruowang
Ruowang
Ruowang
Ryan
Said
Sanborn
Sanday
Saussy
Schafer
Sessions
Shengxiu
Shige juan
Shih
Shuangtian
Shuqing
Sihe
Simmons
Smith
Spence
Spencer
Su
Takaki
Tan
Tan
Tan
Tang
Tannahill
Tzu]
Tzu]
Tzu]
Wang
Wang
Wang
Wang
Wang
Wenfu
Wenfu
Wenfu
Wenyibao
West
White
White
Wong
Wong
Wong
Wong
Xianliang
Xianliang
Xianliang
Xiaobo
Xiaoming
Xiaoming
Xiguang
Xun
Xun
Yan
Yan
Yan
Yan
Yan
Yan
Yan
Yan
Yan
Yan
Yan
Yanchi
Yang
Yang
Yanxiang
Yi
Yi
Yi
Yizi
Yu
Yu
Yu
Yuanlun
Yushi
Yutang
Yü
Zedong
Zedong
Zedong
Zehou
Zehou
Zehou
Zha
Zhang
Zhang
Zhaoyang
Zhenyun
Ziping
Zongfa
Zuoren
Zuoren
Publication venue: 'Duke University Press'
Publication date: 01/01/1999

Crossref